@vicLin8712 vicLin8712 commented Nov 19, 2025

O(1) scheduler: Complete implementation

This PR provides the complete O(1) scheduler implementation and serves as the final part of the 3-part stacked PR series.
It integrates all components introduced in the earlier patches and replaces the legacy O(n) linear scheduler with the new ready-queue–based, RR-cursor-based, bitmap-assisted O(1) design.

Features of the O(1) scheduler

  • Priority-indexed ready queues
    Each priority level maintains an independent ready queue.

  • Bitmap + De Bruijn–based highest-priority lookup
    The scheduler locates the next runnable task in constant time using priority bitmaps and De Bruijn table lookup.

  • RR cursor for fair round-robin scheduling
    Each priority queue maintains a cursor to provide O(1) fair scheduling among tasks of the same priority.

  • Full integration into the scheduler execution path
    The legacy O(n) priority scanning algorithm is completely replaced by the new O(1) logic; the iteration limit IMAX=500 is removed.

  • Idle task fully integrated into new design
    System execution starts in the idle task, which serves as the initial execution context.
    Whenever the idle task yields, control deterministically transitions to the highest-priority runnable task.
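
Taken together, the features above give a selection path that can be modeled by the following toy program. This is a minimal, self-contained sketch, not the kernel code: it assumes 8 priority levels (matching the 8-bit ready bitmap), treats a lower level index as higher priority, and replaces the real ready queues with plain counters and index cursors.

```c
#include <stdint.h>
#include <stdio.h>

#define PRIO_LEVELS 8

/* Toy stand-in for the constant-time lookup; the kernel replaces a loop
 * like this with the De Bruijn multiply + table method (sketched further
 * down, next to the corresponding commit message). */
static int lowest_set_bit(uint32_t v)
{
    int i = 0;
    while (!(v & 1u)) {
        v >>= 1;
        i++;
    }
    return i;
}

/* One ready queue per priority level, modeled here as a task count plus a
 * round-robin cursor (an index instead of a list node). */
static int queue_count[PRIO_LEVELS] = {0, 0, 3, 0, 2, 0, 0, 0};
static int rr_cursor[PRIO_LEVELS];
static uint32_t ready_bitmap = (1u << 2) | (1u << 4); /* levels 2 and 4 ready */

/* Pick the highest-priority non-empty level (lowest index = highest
 * priority here), then rotate fairly within that level. */
static void pick_next(int *prio, int *slot)
{
    *prio = lowest_set_bit(ready_bitmap);
    *slot = rr_cursor[*prio];
    rr_cursor[*prio] = (rr_cursor[*prio] + 1) % queue_count[*prio];
}

int main(void)
{
    for (int i = 0; i < 5; i++) {
        int prio, slot;
        pick_next(&prio, &slot);
        printf("pick: priority level %d, task slot %d\n", prio, slot);
    }
    return 0;
}
```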

Unit tests

Commit: Add unit test suite for RR-cursor scheduler

make sched_cmp
make run

Approach

  • A dedicated controller task is created with priority TASK_PRIO_CRIT to orchestrate the entire test process and enforce deterministic sequencing.
  • After each state change of the test tasks, the unit test verifies both bitmap correctness and per-priority task-count consistency, ensuring alignment with the ready-queue and priority-bitmask invariants maintained by the O(1) scheduler.

Task types
  • Controller task: Coordinates the test flow, triggers all state transitions, and validates ready-queue invariants after each step.
  • Delay task: A runnable task that transitions into TASK_BLOCKED through mo_task_delay().
    Used to verify dequeue behavior and correct clearing of priority bits when a task leaves the schedulable state set.
  • Normal task: A simple infinite-loop runnable task that remains schedulable unless externally suspended or cancelled.
    Serves as the primary subject for testing state transitions and enqueue/dequeue correctness.

Verified state points
The following state transitions are validated by checking both ready-queue task counts and bitmap updates after each operation:

  • Normal task state transitions

    • Creation (TASK_READY) – initial enqueue and priority bit set.
    • Priority change – priority migration updates to queue placement and the corresponding bitmap bit.
    • Suspension (TASK_READY → TASK_SUSPEND) – dequeued from the ready queue and priority bit cleared.
    • Resumption (TASK_SUSPEND → TASK_READY) – re-enqueued with correct priority placement.
    • Cancellation (TASK_READY → TASK_CANCELLED) – removed from ready queues and all bitmap bits fully cleared.
  • Blocked task behavior (TASK_RUNNING → TASK_BLOCKED)

    • The delay task is created and its priority is promoted to match the controller task’s priority (TASK_READY).
    • After the controller yields, the delay task becomes the running task, invokes mo_task_delay(), and transitions to TASK_BLOCKED.
    • Control returns to the controller task, and the test verifies:
      • the delay task is completely removed from the ready queue
      • its priority bit is cleared from the bitmap
      • scheduler selection falls back to the highest remaining runnable task
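
The bitmap/task-count checks described above boil down to one invariant: a priority level's bit is set in the ready bitmap exactly when that level's queue count is non-zero. A minimal sketch of such a check, using the kcb field names from this PR but a simplified stand-in struct and a hypothetical harness:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PRIO_LEVELS 8

/* Simplified stand-in for the scheduler state under test. */
struct kcb_view {
    uint8_t ready_bitmap;               /* bit i set: level i has runnable tasks */
    uint16_t queue_counts[PRIO_LEVELS]; /* runnable tasks per level */
};

/* The invariant checked after every state transition in the tests. */
static bool bitmap_consistent(const struct kcb_view *k)
{
    for (int prio = 0; prio < PRIO_LEVELS; prio++) {
        bool bit_set = (k->ready_bitmap >> prio) & 1u;
        bool has_tasks = k->queue_counts[prio] > 0;
        if (bit_set != has_tasks)
            return false; /* a stale or missing bit breaks O(1) selection */
    }
    return true;
}

int main(void)
{
    struct kcb_view k = {.ready_bitmap = 1u << 4, .queue_counts = {0}};
    k.queue_counts[4] = 2; /* two tasks ready at level 4 */
    printf("%s\n", bitmap_consistent(&k) ? "PASS" : "FAIL");
    return 0;
}
```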

Results

Linmo kernel is starting...
Heap initialized, 130005992 bytes available
idle id 1: entry=80001900 stack=80004488 size=4096
task 2: entry=80000788 stack=80005508 size=4096 prio_level=4 time_slice=5
Scheduler mode: Preemptive
Starting RR-cursor based scheduler test suits...

=== Testing Bitmap and Task Count Consistency ===
task 3: entry=80000168 stack=80006634 size=4096 prio_level=4 time_slice=5
PASS: Bitmap is consistent when TASK_READY
PASS: Task count is consistent when TASK_READY
PASS: Bitmap is consistent when priority migration
PASS: Task count is consistent when priority migration
PASS: Bitmap is consistent when TASK_SUSPENDED
PASS: Task count is consistent when TASK_SUSPENDED
PASS: Bitmap is consistent when TASK_READY from TASK_SUSPENDED
PASS: Task count is consistent when TASK_READY from TASK_SUSPENDED
PASS: Bitmap is consistent when task canceled
PASS: Task count is consistent when task canceled
task 4: entry=80000178 stack=80006634 size=4096 prio_level=4 time_slice=5
PASS: Task count is consistent when task canceled
PASS: Task count is consistent when task blocked

=== Test Results ===
Tests passed: 12
Tests failed: 0
Total tests: 12
All tests PASSED!
RR-cursor based scheduler tests completed successfully.

Note

  1. The term TASK_CANCELLED in this document is used only for explanation. It is not an actual state in the task state machine, but represents the condition where a task has been removed from all scheduling structures and no longer exists in the system.
  2. The task states shown in parentheses (e.g., (TASK_READY)) refer to the state of the test tasks being created or manipulated, not the state of the controller task.

Benchmark

Commit: Add benchmarking files

python3 bench.py

Approach

  1. Spawn N=500 normal tasks to populate the scheduling domain.
    All tasks begin in the TASK_READY state, ensuring the ready queues and bitmap are fully populated.
  2. Scenario configuration (active ratio)
    For each benchmark scenario, suspend a portion of tasks to reach the desired active-ratio load:
  • 2% active
  • 4% active
  • 20% active
  • 50% active
  • 100% active
  3. Benchmark execution
    To compare the legacy O(n) scheduler with the new O(1) scheduler, a compile-time flag OLD is passed to select which scheduling algorithm is active.
    The original linear-search scheduler is preserved in task.c for baseline measurement.
    For each benchmark scenario, the scheduler is executed 20 times to obtain stable timing data.
    The average and maximum scheduling latencies are collected, and the performance improvement is computed as the ratio between the old and new scheduler times (e.g., 1.5× faster).

  4. Metrics collected
    The benchmark collects the following metrics for each scenario:

    • Mean improvement
      Average speedup factor computed as (old_latency / new_latency) across 20 runs.

    • Standard deviation of improvement
      Measures the variability of speedup across repeated runs.

    • Minimum / maximum improvement
      Best and worst observed speedup factors among the 20 runs.

    • 95% confidence interval (CI)
      Statistical confidence bounds for the mean improvement.

    • Mean scheduling latency (old / new)
      Average schedule-selection time for both the legacy O(n) scheduler and the new O(1) scheduler.

    • Maximum scheduling latency (old / new)
      Worst-case schedule-selection time observed for each scheduler.
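
For reference, the improvement statistics listed above follow standard formulas; the sketch below shows the arithmetic for 20 runs, assuming a normal-approximation 95% CI (mean ± 1.96·s/√n) and placeholder ratios. bench.py may compute them differently.

```c
#include <math.h>
#include <stdio.h>

#define RUNS 20

/* Per-run improvement ratio: old scheduler latency / new scheduler latency. */
static double improvement[RUNS];

int main(void)
{
    /* Placeholder data; in practice each entry comes from measured latencies. */
    for (int i = 0; i < RUNS; i++)
        improvement[i] = 2.5 + 0.02 * i;

    double sum = 0.0, min = improvement[0], max = improvement[0];
    for (int i = 0; i < RUNS; i++) {
        sum += improvement[i];
        if (improvement[i] < min) min = improvement[i];
        if (improvement[i] > max) max = improvement[i];
    }
    double mean = sum / RUNS;

    double var = 0.0;
    for (int i = 0; i < RUNS; i++)
        var += (improvement[i] - mean) * (improvement[i] - mean);
    double sd = sqrt(var / (RUNS - 1));            /* sample standard deviation */

    double half = 1.96 * sd / sqrt((double) RUNS); /* 95% CI, normal approximation */
    printf("mean %.2fx, sd %.2fx, min %.2fx, max %.2fx, CI [%.2fx, %.2fx]\n",
           mean, sd, min, max, mean - half, mean + half);
    return 0;
}
```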

Results

Scenario 'Minimal Active':
  mean improvement        = 2.68x faster
  std dev of improvement  = 0.34x
  min / max improvement   = 1.75x  /  3.35x
  95% CI of improvement   = [2.54x, 2.83x]
  mean old sched time     = 5616.25 us
  mean new sched time     = 2119.0 us
  max  old sched time     = 47.0 us
  max  new sched time     = 37.0 us

Scenario 'Moderate Active':
  mean improvement        = 1.80x faster
  std dev of improvement  = 0.27x
  min / max improvement   = 1.27x  /  2.51x
  95% CI of improvement   = [1.68x, 1.92x]
  mean old sched time     = 3887.6 us 
  mean new sched time     = 2179.45 us 
  max  old sched time     = 40.0 us 
  max  new sched time     = 23.0 us 

Scenario 'Heavy Active':
  mean improvement        = 1.02x faster
  std dev of improvement  = 0.08x
  min / max improvement   = 0.84x  /  1.17x
  95% CI of improvement   = [0.98x, 1.06x]
  mean old sched time     = 2150.15 us 
  mean new sched time     = 2119.1 us 
  max  old sched time     = 73.0 us 
  max  new sched time     = 33.0 us 

Scenario 'Stress Test':
  mean improvement        = 0.93x (slower than OLD)
  std dev of improvement  = 0.11x
  min / max improvement   = 0.65x  /  1.20x
  95% CI of improvement   = [0.88x, 0.98x]
  mean old sched time     = 1874.35 us 
  mean new sched time     = 2032.55 us 
  max  old sched time     = 23.0 us 
  max  new sched time     = 20.0 us 

Scenario 'Full Load Test':
  mean improvement        = 0.89x (slower than OLD)
  std dev of improvement  = 0.11x
  min / max improvement   = 0.63x  /  1.07x
  95% CI of improvement   = [0.84x, 0.94x]
  mean old sched time     = 1798.8 us 
  mean new sched time     = 2048.55 us 
  max  old sched time     = 33.0 us 
  max  new sched time     = 52.0 us

Reference

#23 - Draft discussion
#36 - Infrastructure
#37 - Task state transition APIs
ae35c84 - Unit test suite
11e9ee6 - Benchmark


Summary by cubic

Complete O(1) scheduler with priority queues, bitmap lookup, and RR cursors, replacing the legacy O(n) scan. Adds an idle task and updates task lifecycle to use ready queues; up to ~2.7x faster under light load.

  • New Features

    • Priority-indexed ready queues with O(1) highest-priority selection via bitmap + De Bruijn lookup.
    • Per-priority round-robin cursors for fair rotation without list churn.
    • Scheduler state in kcb (ready_bitmap, ready_queues[], rr_cursors[], queue_counts[]) and a dedicated idle task as the safe fallback.
    • Intrusive ready-queue design: TCB embeds rq_node; helpers list_pushback_node() and list_remove_node() manage nodes safely.
    • Unit tests validate bitmap/queue invariants; benchmarks show strong gains at low/moderate activity.
  • Refactors

    • Tasks explicitly enqueue/dequeue on READY/RUNNING transitions (spawn, resume, wakeup, delay, block, suspend, cancel).
    • Semaphore signal uses sched_wakeup_task() to reinsert tasks into ready queues.
    • Priority changes migrate tasks between queues and yield if the running task changes its priority.
    • Startup launches into the idle task (idle_task_init) and removes the IMAX scan limit.

Written for commit 0d8c856. Summary will update automatically on new commits.

@vicLin8712 vicLin8712 mentioned this pull request Nov 19, 2025
@jserv jserv changed the title from "[3/3] O(1) scheduler: Complete implementation" to "O(1) scheduler: Complete implementation" Nov 19, 2025
jserv commented Nov 19, 2025

Do not include numbers in pull-request titles.

This commit extends the core scheduler data structures to support
the new O(1) scheduler design.

Adds in tcb_t:

 - rq_node: embedded list node for ready-queue membership used
   during task state transitions. This avoids redundant malloc/free
   for per-enqueue/dequeue nodes by tying the node's lifetime to
   the task control block.

Adds in kcb_t:

 - ready_bitmap: 8-bit bitmap tracking which priority levels have
   runnable tasks.
 - ready_queues[]: per-priority ready queues for O(1) task
   selection.
 - queue_counts[]: per-priority runnable task counters used for
   bookkeeping and consistency checks.
 - rr_cursors[]: round-robin cursor per priority level to support
   fair selection within the same priority.

These additions are structural only and prepare the scheduler for
O(1) ready-queue operations; they do not change behavior yet.
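
As a rough picture of these additions (field types, array sizes, whether the queues are pointers or embedded list heads, and the node layout are assumptions; the real tcb_t/kcb_t carry many more members):

```c
#include <stdint.h>

#define SCHED_PRIO_LEVELS 8 /* matches the 8-bit ready bitmap described above */

typedef struct list_node {
    struct list_node *prev, *next;
    void *data;
} list_node_t;

typedef struct {
    /* ... existing task fields ... */
    list_node_t rq_node; /* embedded ready-queue node: lifetime tied to the
                            TCB, so enqueue/dequeue needs no malloc/free */
    uint8_t prio_level;
} tcb_t;

typedef struct {
    /* ... existing kernel control block fields ... */
    uint8_t ready_bitmap;                         /* bit i set: level i runnable */
    list_node_t *ready_queues[SCHED_PRIO_LEVELS]; /* per-priority ready queues */
    list_node_t *rr_cursors[SCHED_PRIO_LEVELS];   /* round-robin cursor per level */
    uint16_t queue_counts[SCHED_PRIO_LEVELS];     /* runnable tasks per level */
} kcb_t;
```
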
When a task is enqueued into or dequeued from the ready queue, the
bitmap that indicates the ready queue state should be updated.

These three macros can be used in mo_task_dequeue() and
mo_task_enqueue() APIs to improve readability and maintain
consistency.
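
The three macros themselves are not quoted here; given the ready_bitmap field they presumably resemble set/clear/test helpers along these lines (names invented for illustration):

```c
/* Illustrative only: plausible shapes for the three bitmap helpers. */
#define READY_BITMAP_SET(kcb, prio)   ((kcb)->ready_bitmap |= (uint8_t) (1u << (prio)))
#define READY_BITMAP_CLEAR(kcb, prio) ((kcb)->ready_bitmap &= (uint8_t) ~(1u << (prio)))
#define READY_BITMAP_TEST(kcb, prio)  (((kcb)->ready_bitmap >> (prio)) & 1u)
```
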
This commit introduces two helper functions for intrusive list
usage, where each task embeds its own list node instead of relying
on per-operation malloc/free.

The new APIs allow the scheduler to manipulate ready-queue nodes
directly:

 - list_pushback_node(): append an existing node to the end of the
   list (before the tail sentinel) without allocating memory.

 - list_remove_node(): remove a node from the list without freeing
   it, allowing the caller to control the node's lifetime.

These helpers will be used by the upcoming O(1) scheduler
enqueue/dequeue paths, which require embedded list nodes stored in
tcb_t.
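
A sketch of what these two helpers might look like for a doubly linked list with sentinel nodes; the actual signatures in the PR (for example, taking the list object rather than its tail sentinel) and the node layout may differ:

```c
#include <stddef.h>

typedef struct list_node {
    struct list_node *prev, *next;
    void *data;
} list_node_t;

/* Append an existing node just before the tail sentinel; no allocation. */
static void list_pushback_node(list_node_t *tail, list_node_t *node)
{
    node->prev = tail->prev;
    node->next = tail;
    tail->prev->next = node;
    tail->prev = node;
}

/* Unlink a node without freeing it; the caller keeps ownership of it. */
static void list_remove_node(list_node_t *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node->next = NULL;
}
```
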
This commit refactors sched_enqueue_task() and
sched_dequeue_task() to use the per-priority ready queues and the
embedded rq_node stored in tcb_t, instead of relying only on task
state inspection.

Tasks are now explicitly added to and removed from the appropriate
ready queue, and queue_counts, rr_cursors, and the ready_bitmap
are updated accordingly.
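
Combining the embedded rq_node, the per-priority queue, the counter, and the bitmap bit, the refactored paths have roughly this shape. This is a sketch built on the type and helper sketches above and on the global kcb pointer described in this PR; locking, cursor fix-up when the dequeued node is the cursor, and error handling are omitted.

```c
/* Sketch only: the real functions also maintain the RR cursor on dequeue
 * and run inside the scheduler's critical section. */
static void sched_enqueue_task(tcb_t *task)
{
    uint8_t prio = task->prio_level;

    /* Append the task's embedded node (queue's tail sentinel assumed). */
    list_pushback_node(kcb->ready_queues[prio], &task->rq_node);
    if (kcb->queue_counts[prio]++ == 0) {
        kcb->ready_bitmap |= (uint8_t) (1u << prio); /* level becomes runnable */
        kcb->rr_cursors[prio] = &task->rq_node;      /* first node seeds the cursor */
    }
}

static void sched_dequeue_task(tcb_t *task)
{
    uint8_t prio = task->prio_level;

    list_remove_node(&task->rq_node);
    if (--kcb->queue_counts[prio] == 0)
        kcb->ready_bitmap &= (uint8_t) ~(1u << prio); /* no runnable task left */
}
```
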
This commit introduces a new API, sched_migrate_task(), which enables
migration of a task between ready queues of different priority levels.

The function safely removes the task from its current ready queue and
enqueues it into the target queue, updating the corresponding RR cursor
and ready bitmap to maintain scheduler consistency. This helper will be
used in mo_task_priority() and other task management routines that
adjust task priority dynamically.

Future improvement:
The current enqueue path allocates a new list node for each task
insertion based on its TCB pointer. In the future, this can be optimized
by directly transferring or reusing the existing list node between
ready queues, eliminating the need for additional malloc() and free()
operations during priority migrations.
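
In the node-reuse form described as the future improvement, migration reduces to a dequeue from the old level followed by an enqueue at the new one; note that the PR's current version allocates a fresh node on enqueue instead. A sketch, reusing the enqueue/dequeue sketches above:

```c
/* Sketch of priority migration with direct rq_node reuse (the "future
 * improvement" noted above); the current PR allocates a new node instead. */
static void sched_migrate_task(tcb_t *task, uint8_t new_prio)
{
    sched_dequeue_task(task);    /* clears count/bitmap for the old level */
    task->prio_level = new_prio;
    sched_enqueue_task(task);    /* sets count/bitmap/cursor for the new level */
}
```
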
This change refactors the priority update process in mo_task_priority()
to include early-return checks and proper task migration handling.

- Early-return conditions:
  * Prevent modification of the idle task.
  * Disallow assigning TASK_PRIO_IDLE to non-idle tasks.
  The idle task is created by idle_task_init() during system startup and
  must retain its fixed priority.

- Task migration:
  If the priority-changed task resides in a ready queue (TASK_READY or
  TASK_RUNNING), sched_migrate_task() is called to move it to the queue
  corresponding to the new priority.

- Running task behavior:
  When the current running task changes its own priority, it yields the
  CPU so the scheduler can dispatch the next highest-priority task.
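
The rules above can be summarized in a short sketch; the state field, return codes, and the task_is_idle()/task_yield() calls are illustrative assumptions, not the actual mo_task_priority() signature.

```c
/* Sketch of the early-return and migration rules; illustrative names only. */
static int set_task_priority(tcb_t *task, uint8_t new_prio)
{
    if (task_is_idle(task))          /* never modify the idle task */
        return -1;
    if (new_prio == TASK_PRIO_IDLE)  /* idle priority is reserved */
        return -1;

    if (task->state == TASK_READY || task->state == TASK_RUNNING)
        sched_migrate_task(task, new_prio); /* move between ready queues */
    else
        task->prio_level = new_prio;        /* not queued: just record it */

    if (task == kcb->task_current)
        task_yield(); /* hypothetical yield call: let the scheduler re-pick */
    return 0;
}
```
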
This commit introduces the system idle task and its initialization API
(idle_task_init()). The idle task serves as the default execution
context when no other runnable tasks exist in the system.

The sched_idle() function supports both preemptive and cooperative
modes. In sched_t, a list node named task_idle is added to record the
idle task sentinel. The idle task never enters any ready queue and its
priority level cannot be changed.

When idle_task_init() is called, the idle task is initialized as the
first execution context. This eliminates the need for additional APIs
in main() to set up the initial high-priority task during system launch.
This design allows task priorities to be adjusted safely during
app_main(), while keeping the scheduler’s entry point consistent.

When all ready queues are empty, the scheduler should switch
to idle mode and wait for incoming interrupts. This commit
introduces a dedicated helper to handle that transition,
centralizing the logic and improving readability of the
scheduler path to idle.

Prepare for O(1) bitmap index lookup by adding a 32-entry De Bruijn
sequence table. The table will be used in later commits to replace
iterative bit scanning. No functional change in this patch.

Implement the helper function that uses a De Bruijn multiply-and-LUT
approach to compute the index of the least-significant set bit in O(1)
time complexity.

This helper is not yet wired into the scheduler logic; integration
will follow in a later commit. No functional change in this patch.
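
For reference, the classic 32-entry form of this technique looks like the following; the constants come from the well-known public-domain multiply-and-lookup method, and the actual table contents and helper name added by this PR may differ.

```c
#include <stdint.h>

/* Canonical 32-entry De Bruijn table for the least-significant-bit index
 * (public-domain "multiply and lookup" method). */
static const uint8_t debruijn_lsb32[32] = {
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9,
};

/* Index of the least-significant set bit of a non-zero word, in O(1):
 * v & -v isolates the lowest set bit; multiplying by the De Bruijn constant
 * places a unique 5-bit pattern in the top bits, which indexes the table. */
static inline uint8_t lsb_index(uint32_t v)
{
    return debruijn_lsb32[((v & -v) * 0x077CB531u) >> 27];
}
```
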
Previously, sched_wakeup_task() was limited to internal use within
the scheduler module.
This change makes it globally visible so that it can be reused
in semaphore.c for task wake-up operations.

Previously, mo_sem_signal() only changed the awakened task state
to TASK_READY when a semaphore signal was triggered. In the new
scheduler design, which selects runnable tasks from ready queues,
the awakened task must also be enqueued for scheduling.

This change invokes sched_wakeup_task() to perform the enqueue
operation, ensuring the awakened task is properly inserted into
the ready queue.

Previously, mo_task_spawn() only created a task and appended it to the
global task list (kcb->tasks), assigning the first task directly from
the global list node.

This change adds a call to sched_enqueue_task() within the critical
section to enqueue the task into the ready queue and safely initialize
its scheduling attributes. The first task assignment is now aligned
with the RR cursor mechanism to ensure consistency with the O(1)
scheduler.

Previously, the scheduler iterated through the global task list
(kcb->tasks) to find the next TASK_READY task, resulting in O(N)
selection time. This approach limited scalability and caused
inconsistent task rotation under heavy load.

The new scheduling process:
1. Check the ready bitmap and find the highest priority level.
2. Select the RR cursor node from the corresponding ready queue.
3. Advance the selected cursor node circularly.

Why RR cursor instead of pop/enqueue rotation:
- Fewer operations on the ready queue: compared to the pop/enqueue
  approach, which requires two function calls per switch, the RR
  cursor method only advances one pointer per scheduling cycle.
- Cache friendly: always accesses the same cursor node, improving
  cache locality on hot paths.
- Cycle deterministic: RR cursor design allows deterministic task
  rotation and enables potential future extensions such as cycle
  accounting or fairness-based algorithms.

This change introduces a fully O(1) scheduler design based on
per-priority ready queues and round-robin (RR) cursors. Each ready
queue maintains its own cursor, allowing the scheduler to select
the next runnable task in constant time.
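
Put together, the three steps have roughly this shape. The sketch reuses the lsb_index() and kcb sketches above; the kernel's sentinel handling, circular-advance details, and locking differ.

```c
/* Sketch of the O(1) pick path: bitmap -> De Bruijn index -> RR cursor.
 * Sentinel skipping and locking are omitted; names follow this PR's text. */
static tcb_t *sched_select_next(void)
{
    if (!kcb->ready_bitmap)
        return NULL; /* no runnable task: the caller switches to idle */

    /* 1. Highest runnable priority level (LSB of the ready bitmap). */
    uint8_t prio = lsb_index(kcb->ready_bitmap);

    /* 2. Task currently under that level's round-robin cursor. */
    list_node_t *node = kcb->rr_cursors[prio];
    tcb_t *next = node->data;

    /* 3. Advance the cursor circularly so the next pick at this level
     *    lands on the following task (wrap to the queue head at the end). */
    node = node->next ? node->next : kcb->ready_queues[prio];
    kcb->rr_cursors[prio] = node;

    return next;
}
```
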
Previously, when all ready queues were empty, the scheduler
would trigger a kernel panic. This condition should instead
transition into the idle task rather than panic.

The new sched_switch_to_idle() helper centralizes this logic,
making the path to idle clearer and more readable.

Replace the iterative bitmap scanning with the De Bruijn multiply+LUT
method via the new helper. This change makes top-priority selection
constant-time and deterministic.

The idle task is now initialized in main() during system startup.
This ensures that the scheduler always has a valid execution context
before any user or application tasks are created. Initializing the
idle task early guarantees a safe fallback path when no runnable
tasks exist and keeps the scheduler entry point consistent.

This change sets up the scheduler state during system startup by
assigning kcb->task_current to kcb->harts->task_idle and dispatching
to the idle task as the first execution context.

This commit also keeps the scheduling entry path consistent between
startup and runtime.

Previously, both mo_task_spawn() and idle_task_init() implicitly
bound their created tasks to kcb->task_current as the first execution
context. This behavior caused ambiguity with the scheduler, which is
now responsible for determining the active task during system startup.

This change removes the initial binding logic from both functions,
allowing the startup process (main()) to explicitly assign
kcb->task_current (typically to the idle task) during launch.
This ensures a single, centralized initialization flow and improves
the separation between task creation and scheduling control.